MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes
Authors
Abstract
Recent advances in DNA sequencers are accelerating genome sequencing, especially in microbes, and complete and draft genomes from various species have been sequenced in rapid succession. Here, we present a comprehensive gene prediction tool, the MetaGeneAnnotator (MGA), which precisely predicts all kinds of prokaryotic genes from a single anonymous genomic sequence or a set of such sequences of varying lengths. The MGA integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model built from the input sequences for predictions. As a result, the MGA sensitively detects not only typical genes but also atypical genes, such as horizontally transferred and prophage genes, in a prokaryotic genome. In this paper, we also propose a novel approach for analyzing the ribosomal binding site (RBS), which enables us to detect species-specific RBS patterns. The MGA incorporates an RBS model based on this approach and precisely predicts the translation starts of genes. By using the adapted RBS models, the MGA also improves prediction accuracy for short sequences (96% sensitivity and 93% specificity for 700 bp fragments). These features of the MGA expedite a wide range of microbial genome studies, such as genome annotation and metagenome analysis.
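As a reading aid for the accuracy figures quoted in the abstract, the sketch below shows how sensitivity and specificity are commonly computed in gene-prediction benchmarks. It assumes the gene-finding convention in which specificity is TP / (TP + FP); the abstract does not spell out its definitions, and the function names and counts here are hypothetical, chosen only to illustrate the arithmetic.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of annotated genes that the predictor recovered."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted genes that match an annotated gene.

    Assumption: this is the gene-finding usage of "specificity"
    (equivalent to precision), not TN / (TN + FP).
    """
    return true_positives / (true_positives + false_positives)


# Hypothetical counts, not taken from the paper.
tp, fn, fp = 960, 40, 72
sn = sensitivity(tp, fn)
sp = specificity(tp, fp)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

With these made-up counts, 960 of 1000 annotated genes are recovered and 960 of 1032 predictions are correct, which rounds to the same two percentages as the abstract purely by construction.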
Similar resources
Identification of a Specific Pseudo attP Site for Phage PhiC31 Integrase in Bovine Genome
Background: The PhiC31 integrase system provides a new platform in various fields of research, mainly in gene therapy and the creation of transgenic animals. This system enables integration of exogenous DNA into preferred locations in mammalian genomes, which results in robust, long-term expression of the integrated transgene. Objectives: Identification of a novel pseudo attP site. Materials and Methods...
Starts of bacterial genes: estimating the reliability of computer predictions.
Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for analysis of genomes with few or no experimentally mapped genes. There is a clear correlation between the ribosomal binding sit...
Identification of a Specific Pseudo attP Site for Phage phiC31 Integrase in the Genome of Chinese Hamster in CHO-K1 Cell Line
Background: PhiC31 integrase is a site-specific DNA recombinase that integrates DNA into chromosomes by recombination between the attB and attP sites. Several pseudo attP sites with features critical for long-term transgene expression have been identified in mammalian genomes. In this manuscript, we report a novel intrinsic pseudo attP site named CHOL1 in the Chi...
In silico investigation of lactoferrin protein characterizations for the prediction of anti-microbial properties
Lactoferrin (Lf) is an iron-binding multi-functional glycoprotein which has numerous physiological functions such as iron transportation, anti-microbial activity and immune response. In this study, different in silico approaches were exploited to investigate Lf protein properties in a number of mammalian species. Results showed that the iron-binding site, DNA and RNA-binding sites, signal pepti...
GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition
Motivation: Ab initio gene prediction in prokaryotic genomes is supposed to be so accurate that RNASeq data are rarely produced to bring in an additional layer of evidence. In 2016 more than 60,000 prokaryotic genomes were re-annotated by the NCBI pipeline. Given the sheer volume of prokaryotic DNA data flowing from next generation sequencing facilities into genome databases, the annotation acc...
Journal title:
- DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
Volume 15, Issue
Pages -
Publication date: 2008